18 research outputs found

    GEML: A Grammatical Evolution, Machine Learning Approach to Multi-class Classification

    In this paper, we propose a hybrid approach to solving multi-class problems which combines evolutionary computation with elements of traditional machine learning. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods and integrates these into a Grammatical Evolution framework. We investigate the effectiveness of GEML on several supervised, semi-supervised and unsupervised multi-class problems and demonstrate its competitive performance when compared with several well-known machine learning algorithms. The GEML framework evolves human-readable solutions which provide an explanation of the logic behind its classification decisions, offering a significant advantage over existing paradigms for unsupervised and semi-supervised learning. In addition, we examine the possibility of improving the performance of the algorithm through the application of several ensemble techniques.

    KNN-LC: Classification in Unbalanced Datasets using a KNN-Based Algorithm and Local Centralities

    Classification is one of the most central topics in machine learning. Yet most of the algorithms that solve the classification problem operate under the assumption that the training datasets are balanced. While this assumption is reasonable for many classification problems, it is often not valid. For example, application domains such as fraud and spam detection are characterized by highly unbalanced classes, where the examples of malicious items are far less numerous than the benign ones. This paper proposes a KNN-based algorithm adapted to unbalanced classes. The algorithm precomputes distances in the training set as well as a centrality score for every training item. It then weights the distances between the items to be classified and their K-nearest training neighbors, accounting for the distribution of distances in every class and the centrality (and outlierness) of neighbors. This reduces the noise from outliers of the majority class and enhances the weights of central data points, allowing the proposed algorithm to achieve high accuracy in addition to a high true positive rate (TPR) in the minority class.
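
    The steps described in this abstract (precompute training distances and a centrality score, then weight the K nearest neighbors by both) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the function names `fit` and `predict`, the centrality definition (inverse of the mean distance to same-class items), and the vote weight (centrality divided by class-normalised distance) are hypothetical and are not the KNN-LC definitions.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Precompute a centrality score per training item and a distance scale per class.

    ASSUMPTION: centrality = 1 / (1 + mean distance to same-class items).
    ASSUMPTION: class scale = average of those mean distances.
    Neither definition is taken from the paper.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    centrality = [0.0] * len(X)
    scale = {}
    for label, idx in by_class.items():
        mean_d = []
        for i in idx:
            others = [math.dist(X[i], X[j]) for j in idx if j != i]
            mean_d.append(sum(others) / len(others) if others else 0.0)
        for i, m in zip(idx, mean_d):
            # Items close to their classmates score high; outliers score low.
            centrality[i] = 1.0 / (1.0 + m)
        # Fall back to 1.0 when the class has a single item or zero spread.
        scale[label] = (sum(mean_d) / len(mean_d)) or 1.0
    return X, y, centrality, scale


def predict(model, x, k=3):
    """Classify x by a weighted vote among its k nearest training items."""
    X, y, centrality, scale = model
    dists = sorted((math.dist(x, xi), i) for i, xi in enumerate(X))
    votes = defaultdict(float)
    for d, i in dists[:k]:
        # Distance is normalised by the neighbor's class scale, and the vote
        # is weighted by the neighbor's centrality (hypothetical weighting).
        votes[y[i]] += centrality[i] / (1e-9 + d / scale[y[i]])
    return max(votes, key=votes.get)


# Toy data: five majority-class items (0) and two minority-class items (1).
X_train = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6)]
y_train = [0, 0, 0, 0, 0, 1, 1]
model = fit(X_train, y_train)
minority_guess = predict(model, (5.2, 5.3))
majority_guess = predict(model, (0.2, 0.1))
```

    With k=3, the query near the minority cluster has one majority-class item among its neighbors, but that neighbor is far away relative to its class's typical spread, so its vote is small under the assumed weighting.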

    Developing New Fitness Functions in Genetic Programming for Classification With Unbalanced Data


    Stochastic Semantic-Based Multi-objective Genetic Programming Optimisation for Classification of Imbalanced Data

    Data sets with imbalanced class distribution pose serious challenges to well-established classifiers. In this work, we propose a stochastic, semantics-based multi-objective genetic programming approach. We tested this approach on imbalanced binary classification data sets, where it achieves, in some cases, higher recall, precision and F-measure values on the minority class than C4.5, Naive Bayes and Support Vector Machine, without significantly decreasing these values on the majority class.
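
    The abstract reports recall, precision and F-measure on the minority class. As a reminder of why these are used instead of overall accuracy on imbalanced data, here is a small sketch using the standard definitions. The helper name `minority_class_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
def minority_class_scores(y_true, y_pred, minority=1):
    """Return (precision, recall, F-measure) for the given minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure here is the harmonic mean of precision and recall (F1).
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Toy imbalanced labels: eight majority items (0), two minority items (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A classifier that always predicts the majority class is 80% accurate
# but never finds a minority item.
always_majority = minority_class_scores(y_true, [0] * 10)

# A classifier with one hit, one false alarm and one miss on the minority class.
mixed = minority_class_scores(y_true, [0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
```

    The always-majority predictor scores zero on all three minority-class measures despite its high accuracy, which is the failure mode these metrics are meant to expose.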